AITopics | state distribution

Predictive Preference Learning from Human Interventions

Neural Information Processing Systems | Jun-17-2026, 05:32:19 GMT

Learning from human involvement aims to incorporate a human subject who monitors and corrects errors in agent behavior. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. The key idea of PPL is to bootstrap each human intervention into L future time steps, called the preference horizon, with the assumption that the agent follows the same action and the human makes the same intervention in the preference horizon. By applying preference optimization on these future states, expert corrections are propagated into the safety-critical regions where the agent is expected to explore, significantly improving learning efficiency and reducing the number of human demonstrations needed. We evaluate our approach with experiments on both autonomous driving and robotic manipulation benchmarks and demonstrate its efficiency and generality.

agent, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)
Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Pre-trained Large Language Models Learn to Predict Hidden Markov Models In-context

Neural Information Processing Systems | Jun-15-2026, 20:13:16 GMT

Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL)--their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn to predict HMM-generated sequences--an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
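
As an illustration of what an optimal predictor looks like when the HMM parameters are known, the following minimal sketch runs the standard forward recursion to obtain the next-observation distribution. It is not taken from the paper: the function name `predict_next` and the two-state toy parameters are hypothetical, and it is only an assumption that the paper's "theoretical optimum" corresponds to this kind of predictor.

```python
def predict_next(init, trans, emit, observations):
    """Next-observation distribution of a discrete HMM via the forward recursion.

    init[i]     : probability that the first hidden state is i
    trans[i][j] : probability of moving from hidden state i to j
    emit[i][o]  : probability of emitting symbol o from hidden state i
    """
    n = len(init)
    belief = list(init)  # distribution over the current hidden state
    for o in observations:
        # Condition the belief on the observed symbol and renormalise.
        belief = [belief[i] * emit[i][o] for i in range(n)]
        z = sum(belief)
        belief = [b / z for b in belief]
        # Propagate one step through the transition matrix.
        belief = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    m = len(emit[0])
    return [sum(belief[j] * emit[j][o] for j in range(n)) for o in range(m)]


# Hypothetical two-state, two-symbol HMM used only for illustration.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
probs = predict_next(init, trans, emit, [0])
```

An in-context learner sees only the observation sequence, whereas this recursion is given the true parameters, so its predictions serve as a reference point rather than a fair competitor.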

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.46)
Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

sup

Neural Information Processing Systems | Apr-25-2026, 07:43:39 GMT

A.1 Notation. In this appendix, we use the notation $d_t^{\pi}(\cdot,\cdot)$ to indicate the state-action visitation measure induced by the policy $\pi$ at time $t$. We overload the notation $d_t^{\pi}(\cdot)$ to denote the state-visitation measure induced by the policy $\pi$ at time $t$. Likewise, the notations $d_t^{D}(\cdot,\cdot)$ and $d_t^{D}(\cdot)$ indicate the empirical visitation measures in the dataset $D$. For a function $g : \mathcal{X} \to \mathbb{R}$, the norm $\|g\|_\infty \triangleq \sup_{x \in \mathcal{X}} |g(x)|$. Before discussing the proofs of the results, we also explain the instantiation of the function class in the tabular setting below. A.2 Imitation gap upper bound on empirical moment matching (Theorem 3.1). Below we restate Theorem 3.1 and provide a proof of this result. The key observation is that since the learner $\pi^{\mathrm{MM}}$ best matches the empirical distribution in the dataset, which is in turn close to the population visitation measure induced by $\pi^E$, we can expect the visitation measures induced by $\pi^E$ and $\pi^{\mathrm{MM}}$ to be close. This in turn implies that both policies will collect a similar value under any reward function. Precisely characterizing the rates at which these distributions converge to one another results in the final bound. Consider the empirical moment matching learner $\pi^{\mathrm{MM}}$ (eq. ... $\mathrm{TV}(d_t^{\pi}, d_t^{D})$ (20) where the equation follows by the variational definition of the total variation distance, and where $d_t^{\pi}$ is the state-action visitation measure induced by $\pi^E$ and $d_t^{D}$ is the empirical state-action visitation measure in the dataset $D$. The imitation gap of this policy can be upper bounded by $J(\pi^E) - J(\pi^{\mathrm{MM}}) = \mathbb{E}_{\pi^E}[\,\ldots$ This goes to show that in the tabular setting, MM is equivalent to finding the policy which best matches (in TV-distance) the empirical state-action distribution observed in the dataset.
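
To make the matching criterion concrete, here is a minimal sketch of the total variation distance between two empirical state-action distributions at a single time step, together with a check of the variational form in which the supremum over test functions with $\|g\|_\infty \le 1$ is attained by the sign of the difference. The helper names and the toy samples are hypothetical and not from the paper, and the factor of one half is the usual normalisation, assumed here.

```python
from collections import Counter


def empirical_measure(pairs):
    """Empirical distribution over (state, action) pairs observed at one time step."""
    counts = Counter(pairs)
    n = len(pairs)
    return {sa: c / n for sa, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance: half the L1 distance between two pmfs."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def tv_variational(p, q):
    """Same quantity via the test function g = sign(p - q), which has sup-norm at most 1."""
    support = set(p) | set(q)
    total = 0.0
    for x in support:
        diff = p.get(x, 0.0) - q.get(x, 0.0)
        g = 1.0 if diff > 0 else -1.0
        total += g * diff
    return 0.5 * total


# Hypothetical samples: (state, action) pairs seen at time t.
dataset_pairs = [("s0", "a0"), ("s0", "a0"), ("s1", "a1"), ("s1", "a0")]
learner_pairs = [("s0", "a0"), ("s1", "a1"), ("s1", "a1"), ("s1", "a1")]

d_D = empirical_measure(dataset_pairs)
d_pi = empirical_measure(learner_pairs)
tv = tv_distance(d_pi, d_D)
```

In the tabular setting described above, a moment matching learner would search over policies to make this quantity small at every time step; the sketch only evaluates it for one fixed pair of distributions.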

artificial intelligence, machine learning, nexp, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

248024541dbda1d3fd75fe49d1a4df4d-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 03:47:08 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)
(2 more...)

1cb5b3d64bdf3c6642c8d9a8fbecd019-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 23:11:05 GMT

artificial intelligence, machine learning, objective, (12 more...)

Neural Information Processing Systems

Industry:

Education (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Robots (0.69)

On the Value of Interaction and Function Approximation in Imitation Learning

Neural Information Processing Systems | Apr-24-2026, 14:56:03 GMT

We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs.

learner, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County (0.28)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

Data-Efficient Non-Gaussian Semi-Nonparametric Density Estimation for Nonlinear Dynamical Systems

Liao, Aaron R., Oguri, Kenshiro, Carpenter, Michele D.

arXiv.org Machine Learning | Apr-13-2026

Accurate representation of non-Gaussian distributions of quantities of interest in nonlinear dynamical systems is critical for estimation, control, and decision-making, but can be challenging when forward propagations are expensive to carry out. This paper presents an approach for estimating probability density functions of states evolving under nonlinear dynamics using Seminonparametric (SNP), or Gallant-Nychka, densities. SNP densities employ a probabilists' Hermite polynomial basis to model non-Gaussian behavior and are positive everywhere on the support by construction. We use Monte Carlo to approximate the expectation integrals that arise in the maximum likelihood estimation of SNP coefficients, and introduce a convex relaxation to generate effective initial estimates. The method is demonstrated on density and quantile estimation for the chaotic Lorenz system. The results demonstrate that the proposed method can accurately capture non-Gaussian density structure and compute quantiles using significantly fewer samples than raw Monte Carlo sampling.
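
The structure of an SNP density can be sketched in one dimension as a squared Hermite expansion multiplied by a standard normal density. The code below is a hedged illustration, not the paper's implementation: the function names, the example coefficients, and the univariate standardised form are assumptions. It relies on the orthogonality of the probabilists' Hermite polynomials under the standard normal weight, which gives the normalising constant as the sum of $c_k^2 \, k!$.

```python
import math


def hermite_e(k_max, x):
    """Probabilists' Hermite polynomials He_0..He_k_max at x, using the
    recurrence He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)."""
    values = [1.0]
    if k_max >= 1:
        values.append(x)
    for k in range(1, k_max):
        values.append(x * values[k] - k * values[k - 1])
    return values


def snp_density(x, coeffs):
    """Squared Hermite expansion times the standard normal pdf, normalised.

    The integral of He_j * He_k against the standard normal pdf is k! when
    j == k and 0 otherwise, so the normaliser is sum_k coeffs[k]**2 * k!.
    """
    he = hermite_e(len(coeffs) - 1, x)
    poly = sum(c * h for c, h in zip(coeffs, he))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    norm = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs))
    return poly * poly * phi / norm


def integrate(f, lo, hi, n):
    """Composite trapezoid rule with n equal sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Hypothetical coefficients; a non-zero c_1 skews the density.
coeffs = [1.0, 0.3, 0.2]
total_mass = integrate(lambda x: snp_density(x, coeffs), -10.0, 10.0, 4000)
```

Fitting the coefficients by maximum likelihood, as the abstract describes, would sit on top of an evaluation routine like this one; that estimation step is not shown here.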

artificial intelligence, machine learning, snp density, (15 more...)

arXiv.org Machine Learning

2604.09375

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

c556da88a2665e6266453d8c9b8a552d-Paper-Conference.pdf

Neural Information Processing Systems | Feb-18-2026, 01:20:03 GMT

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Exploration by Learning Diverse Skills through Successor State Representations

Neural Information Processing Systems | Feb-16-2026, 10:56:15 GMT

The ability to perform different skills can encourage agents to explore. In this work, we aim to construct a set of diverse skills that uniformly cover the state space.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning

Neural Information Processing Systems | Feb-15-2026, 05:31:14 GMT

However, such pessimism toward out-of-sample data can be too restrictive and sample-inefficient, since some out-of-sample (unseen) states are in fact generalizable [20].

inverse dynamic model, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: